INTPYTHON-751 Make query generation omit $expr unless required #396
base: main
Substr.as_mql = substr
Trim.as_mql = trim("trim")
TruncBase.as_mql = trunc
Cast.as_mql_expr = cast
None of these functions support as_mql_path. It could be added later if we try to simplify constant expressions.
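The assignments above attach module-level functions to expression classes as methods. The sketch below reproduces that pattern with hypothetical stand-in classes (not Django's real Substr or Cast) and a hypothetical supports_path() helper, to show why a function registered only as as_mql_expr leaves no path form to dispatch to:

```python
# Hypothetical stand-ins; Django's real Substr/Cast live in
# django.db.models.functions and carry much more behavior.
class Substr:
    pass


class Cast:
    pass


def substr(self, compiler, connection):
    # A plain function assigned to a class attribute becomes a method,
    # so `self` is bound automatically when called on an instance.
    return {"$substrCP": ["$name", 0, 3]}


def cast(self, compiler, connection):
    return {"$toInt": "$price"}


Substr.as_mql = substr
Cast.as_mql_expr = cast


def supports_path(expression):
    # Hypothetical check: only classes that define as_mql_path can be
    # rendered as an index-friendly `field: value` match.
    return callable(getattr(expression, "as_mql_path", None))


print(Substr().as_mql(None, None))
print(supports_path(Cast()))
```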
Left my first round of comments. I'm going to review the tests as a second phase of the PR.
def as_mql_expr(self, compiler, connection):
    lhs_mql = process_lhs(self, compiler, connection, as_path=False)
    value = process_rhs(self, compiler, connection, as_path=False)
    return {"$gte": [lhs_mql, value]}

def as_mql_path(self, compiler, connection):
    lhs_mql = process_lhs(self, compiler, connection, as_path=True)
    value = process_rhs(self, compiler, connection, as_path=True)
    return {lhs_mql: {"$gte": value}}
Why is there a $gte query in a search.text lookup?
To convert a score function into a filter, I decided to express the following proposition: score_func(...) > 0.
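The as_mql_expr/as_mql_path pair above renders the same comparison in two shapes. A minimal sketch of both, assuming the left-hand side is a plain column name and the right-hand side a constant; gte_expr, gte_path, and score_filter are hypothetical helpers standing in for the PR's process_lhs/process_rhs plumbing, and the score is assumed to be an aggregation expression with no field path:

```python
def gte_expr(column, value):
    # Aggregation-expression form: only valid inside $expr in a $match.
    return {"$gte": [f"${column}", value]}


def gte_path(column, value):
    # Query-operator form: usable directly in $match and index-friendly.
    return {column: {"$gte": value}}


def score_filter(score_expr):
    # A computed score has no field path, so "score_func(...) > 0" can
    # only be expressed as an aggregation comparison inside $expr.
    return {"$expr": {"$gt": [score_expr, 0]}}


print({"$match": {"$expr": gte_expr("price", 10)}})
print({"$match": gte_path("price", 10)})
```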
    return self.is_simple_column

@cached_property
def is_simple_column(self):
I see this validation for embedded models used in multiple places. Can you consolidate it into the query_utils file?
🤔 Will try. The overall structure of the function is similar, but the type varies. I could create a meta-function that generates the appropriate function for a given type.
I couldn't make it better. 😬 I will leave it as it is. There are some differing details, such as previous._field.column versus previous.key_name when extracting the field name or the path. The return values also differ between EMFA and EMF: one has to validate the inner transform while the other doesn't.
def valid_path_key_name(key_name):
    return bool(re.fullmatch(r"[A-Za-z0-9_]+", key_name))
https://www.mongodb.com/docs/manual/core/dot-dollar-considerations/
Values like hashtags are also valid in path names and don't require $expr. To my knowledge, as long as a key doesn't contain a period (.) or a dollar sign ($), it's fine.
Thanks, will adjust the expression.
What if there is an emoji or some other non-ASCII character? 🤔
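One possible relaxation of valid_path_key_name along the lines suggested above, sketched as a hypothetical replacement rather than the code that was merged: reject only keys that contain a period, a dollar sign, or a null character (BSON field names are null-terminated strings), plus empty keys. Non-ASCII text and emoji pass, since they contain none of these characters:

```python
import re

# Hypothetical relaxed rule: "." would split the key into a nested path,
# "$" (especially a leading one) can be read as an operator or variable,
# and "\x00" cannot appear in a BSON field name at all.
_FORBIDDEN_IN_PATH_KEY = re.compile(r"[.$\x00]")


def valid_path_key_name(key_name):
    # Empty keys are rejected too; they would produce paths such as "a..b".
    return bool(key_name) and _FORBIDDEN_IN_PATH_KEY.search(key_name) is None


for key in ["integer_", "#hashtag", "café", "📶", "a.b", "$$Wifi", ""]:
    print(repr(key), valid_path_key_name(key))
```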
def as_mql_expr(self, compiler, connection):
    columns, parent_field = self._get_target_path()
    mql = parent_field.as_mql(compiler, connection)
    for key in columns:
        mql = {"$getField": {"input": mql, "field": key}}
Potentially out of scope:
https://github.com/mongodb/django-mongodb-backend/pull/392/files#diff-0a6ce30a131a00fa88086c4c4d0d6e6232845fd11ef2bc67891fdf92e10c3743R18-R45
Is it possible to still remove $getField in as_mql_expr, or is the routing to as_mql_expr for embedded model queries expected because a $getField call is needed?
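A sketch of the trade-off raised above, using hypothetical helpers nested_get_field and embedded_field_mql: the loop in as_mql_expr produces one $getField per key, while a dotted field path such as "$data.integer_" reaches the same value when every key is a plain name and the intermediate values are embedded documents. The two forms differ when an intermediate value is an array (a dotted path traverses array elements, $getField does not), so this is an illustration under that assumption, not a drop-in change:

```python
import re

# Same character rule as the original valid_path_key_name.
_PLAIN_KEY = re.compile(r"[A-Za-z0-9_]+")


def nested_get_field(input_expr, keys):
    # Mirrors the as_mql_expr loop: wrap the input once per key.
    mql = input_expr
    for key in keys:
        mql = {"$getField": {"input": mql, "field": key}}
    return mql


def embedded_field_mql(parent, keys):
    # Hypothetical: prefer the shorter dotted path when all keys are plain;
    # otherwise fall back to $getField, which accepts keys such as "a.b".
    # (Keys starting with "$" would additionally need $literal; not handled.)
    if all(_PLAIN_KEY.fullmatch(key) for key in keys):
        return "$" + ".".join([parent, *keys])
    return nested_get_field(f"${parent}", keys)


print(embedded_field_mql("data", ["integer_"]))
print(embedded_field_mql("data", ["a.b"]))
```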
django_mongodb_backend/lookups.py
@property
def can_use_path(self):
    simple_column = getattr(self.lhs, "is_simple_column", False)
    constant_value = is_constant_value(self.rhs)
    return simple_column and constant_value
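The can_use_path property above combines two checks. A self-contained sketch with hypothetical stand-ins (Col, FieldRef, Value, Lookup, and this version of is_constant_value are illustrations, not the backend's classes): the path form is chosen only when the left-hand side is a simple column and the right-hand side contains no column references:

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Col:
    # Hypothetical stand-in for a plain column on the left-hand side.
    name: str
    is_simple_column: bool = True


@dataclass
class FieldRef:
    # Hypothetical stand-in for a reference to another column (like F()).
    name: str


@dataclass
class Value:
    # Hypothetical stand-in for a wrapped constant (like Value()).
    value: Any


def is_constant_value(rhs):
    # Assumption: plain literals, Value wrappers, and lists/tuples of
    # constants are constant; any column reference is not.
    if isinstance(rhs, FieldRef):
        return False
    if isinstance(rhs, Value):
        return is_constant_value(rhs.value)
    if isinstance(rhs, (list, tuple)):
        return all(is_constant_value(item) for item in rhs)
    return True


@dataclass
class Lookup:
    lhs: Any
    rhs: Any

    @property
    def can_use_path(self):
        simple_column = getattr(self.lhs, "is_simple_column", False)
        constant_value = is_constant_value(self.rhs)
        return simple_column and constant_value


print(Lookup(Col("price"), Value(10)).can_use_path)
print(Lookup(Col("price"), FieldRef("cost")).can_use_path)
```

An lhs without an is_simple_column attribute (for example, an arbitrary function call) falls back to False through getattr, so it is always rendered via $expr.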
💯
self.assertAggregateQuery(
    query,
    "model_fields__nullableintegerarraymodel",
    [{"$match": {"field": {"$in": ([1], [2])}}}],
Does $in now expect a tuple?
Yes, but it wasn’t a new convention or expectation. There was already a test that checks the RHS of $in as a tuple, so I just followed that convention. It doesn’t affect the query behavior.
"$match": {
    "$expr": {
        "$eq": [
            {"$getField": {"input": "$data", "field": "integer_"}},
This is an example of a value that could get rid of the getField.
Yes, we could get rid of those $getField calls if we used $data.integer_, but I thought it was out of scope for this refactor. This is the existing behavior.
Yes, it's out of scope!
[
    {
        "$match": {
            "$expr": {
Curious about the callback chain on this one, since the null check could actually be converted. This may be best added as an improvement later.
if not valid_path_key_name(previous._field.column):
    return False
The purpose of this could be described with a comment. What's an example of an invalid path key name? And valid_path_key_name could use a comment or docstring. I guess "user.address.city" is a path and "user", "address", etc. are considered "keys"?
A JSON field could have unusual strings as keys. This test shows the case. Yes, I called the things between dots "keys". 🤔 I don't know if there is a better name.
But this is EmbeddedModelTransform, not JSONField's KeyTransform. Is this tested? I tried model_fields_ and all tests passed with assert False above the return False.
🤔 I thought I had added a test, but I forgot. We could define a column like:
wifi_column = models.IntegerField(db_column="$$Wifi")
Now I am also wondering about the other models and lookups: will they work if users start defining columns like that?
But maybe this analysis is out of scope for this ticket. 😬
Without tests, I'm not sure if there could be other issues. We should ask the team if we want to support it. It seems unlikely based on the discouragements described at https://www.mongodb.com/docs/manual/core/dot-dollar-considerations/#field-names-with-periods-and-dollar-signs.
Test added: test_query_price_column (it should be polished, but it shows the problem).
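Why a column like db_column="$$Wifi" cannot use the path form: as a top-level key in $match, a name starting with "$" is parsed as a query operator, and inside $expr the string "$$Wifi" would be read as the variable Wifi. The MongoDB $getField documentation says a field name beginning with "$" must be wrapped in $literal. A minimal sketch, where get_field_expr is a hypothetical helper and not part of the backend:

```python
def get_field_expr(field_name, input_expr="$$CURRENT"):
    # Wrap "$"-prefixed names in $literal so they are taken as a plain
    # string rather than a field path or variable reference.
    if field_name.startswith("$"):
        field = {"$literal": field_name}
    else:
        field = field_name
    return {"$getField": {"field": field, "input": input_expr}}


# Compare a "$$Wifi" column against a constant inside $expr.
match = {"$match": {"$expr": {"$eq": [get_field_expr("$$Wifi"), 5]}}}
print(match)
```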
Design Doc
In this PR, a unified approach for generating MQL from Django expressions was implemented. The core idea is to centralize the control flow in a base_expression method, which decides whether the expression can be translated into a direct field: value match (index-friendly) or must fall back to $expr. This keeps the logic for wrapping and dispatching in one place, while each lookup/function only defines its own expression-building logic.
This approach also allows mixing direct field: value matches with $expr clauses within the same $match. As a result, multiple $expr entries may coexist alongside index-optimized conditions, depending on the shape of the query.
Most lookups now follow this pattern by simply implementing as_mql_expr (and optionally as_mql_path when a match-based translation is possible). Only a few special cases, such as Col, Func operators (except KeyTransform), and a handful of others, override the base behavior directly. This structure also leaves room for future optimizations (e.g., constant folding) without having to change the overall flow.
Additionally, since MongoDB 6 does not allow nesting $expr inside another $expr, the flow in base_expression ensures that such cases are flattened. In practice, expressions are generated without redundant wrapping, so the final MQL never contains $expr within $expr.
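A minimal sketch of the base_expression flow described above. Col, Expression, Exact, and And are hypothetical stand-ins, not the backend's classes, and the real methods also receive compiler and connection; the sketch only shows the dispatch: path form when possible, a single $expr wrapper otherwise, and no $expr inside $expr:

```python
class Col:
    # Hypothetical stand-in for a column reference on the right-hand side.
    def __init__(self, name):
        self.name = name


class Expression:
    """Hypothetical base class sketching the base_expression control flow."""

    can_use_path = False

    def as_mql_expr(self):
        raise NotImplementedError

    def as_mql_path(self):
        raise NotImplementedError

    def base_expression(self, as_expr=False):
        if as_expr:
            # Already inside $expr: return the bare expression, never a
            # second $expr wrapper.
            return self.as_mql_expr()
        if self.can_use_path:
            # Index-friendly `field: {operator: value}` form.
            return self.as_mql_path()
        return {"$expr": self.as_mql_expr()}


class Exact(Expression):
    def __init__(self, column, rhs):
        self.column = column
        self.rhs = rhs
        # Stand-in for the real checks: path form only for constant values.
        self.can_use_path = not isinstance(rhs, Col)

    def as_mql_expr(self):
        rhs = f"${self.rhs.name}" if isinstance(self.rhs, Col) else self.rhs
        return {"$eq": [f"${self.column}", rhs]}

    def as_mql_path(self):
        return {self.column: {"$eq": self.rhs}}


class And(Expression):
    def __init__(self, *children):
        self.children = children

    def base_expression(self, as_expr=False):
        # Each child picks its own form, so path matches and $expr clauses
        # can sit side by side; inside $expr, children stay unwrapped.
        return {"$and": [child.base_expression(as_expr) for child in self.children]}


query = And(Exact("status", "active"), Exact("price", Col("cost")))
print({"$match": query.base_expression()})
print({"$match": {"$expr": query.base_expression(as_expr=True)}})
```

At the top level, the constant comparison keeps its index-friendly form while the column-to-column comparison falls back to $expr within the same $match; when the whole condition is rendered inside $expr, neither child is wrapped again.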